Efficient K-Nearest Neighbor Join Algorithms for High Dimensional Sparse Data
نویسندگان
چکیده
The K-Nearest Neighbor (KNN) join is an expensive but important operation in many data mining algorithms. Several recent applications need to perform KNN join for high dimensional sparse data. Unfortunately, all existing KNN join algorithms are designed for low dimensional data. To fulfill this void, we investigate the KNN join problem for high dimensional sparse data. In this paper, we propose three KNN join algorithms: a brute force (BF) algorithm, an inverted index-based(IIB) algorithm and an improved inverted index-based(IIIB) algorithm. Extensive experiments on both synthetic and real-world datasets were conducted to demonstrate the effectiveness of our algorithms for high dimensional sparse data.
منابع مشابه
High-dimensional kNN joins with incremental updates
The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is an especially expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this...
متن کاملEfficient index-based KNN join processing for high-dimensional data
In many advanced database applications (e.g., multimedia databases), data objects are transformed into high-dimensional points and manipulated in high-dimensional space. One of the most important but costly operations is the similarity join that combines similar points from multiple datasets. In this paper, we examine the problem of processing K-nearest neighbor similarity join (KNN join). KNN ...
متن کاملEfficient Processing of k Nearest Neighbor Joins using MapReduce
k nearest neighbor join (kNN join), designed to find k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join on a centr...
متن کاملHigh Performance Data Mining Using the Nearest Neighbor Join
The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Well-known are two types of the similarity join, the distance range join where the user defines a distance threshold for the join, and the closest point query or k-distance ...
متن کاملSupporting KDD Applications by the k-Nearest Neighbor Join
The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Well-known are two types of the similarity join, the distance range join where the user defines a distance threshold for the join, and the closest point query or k-distance ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1011.2807 شماره
صفحات -
تاریخ انتشار 2010